{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Fast Bayesian estimation of SARIMAX models\n",
    "\n",
    "### Introduction\n",
    "\n",
    "This notebook will show how to use fast Bayesian methods to estimate SARIMAX (Seasonal AutoRegressive Integrated Moving Average with eXogenous regressors) models. These methods can also be parallelised across multiple cores.\n",
    "\n",
    "Here, fast methods means a version of Hamiltonian Monte Carlo called the No-U-Turn Sampler (NUTS) developed by Hoffmann and Gelman: see [Hoffman, M. D., & Gelman, A. (2014). The No-U-Turn sampler: adaptively setting path lengths in Hamiltonian Monte Carlo. Journal of Machine Learning Research, 15(1), 1593-1623.](https://arxiv.org/abs/1111.4246). As they say, \"the cost of HMC per independent sample from a target distribution of dimension $D$ is roughly $\\mathcal{O}(D^{5/4})$, which stands in sharp contrast with the $\\mathcal{O}(D^{2})$ cost of random-walk Metropolis\". So for problems of larger dimension, the time-saving with HMC is significant. However it does require the gradient, or Jacobian, of the model to be provided.\n",
    "\n",
    "This notebook will combine the Python libraries [statsmodels](https://www.statsmodels.org/stable/index.html), which does econometrics, and [PyMC3](https://docs.pymc.io/), which is for Bayesian estimation, to perform fast Bayesian estimation of a simple SARIMAX model, in this case an ARMA(1, 1) model for US CPI.\n",
    "\n",
    "Note that, for simple models like AR(p), base PyMC3 is a quicker way to fit a model; there's an [example here](https://docs.pymc.io/notebooks/AR.html). The advantage of using statsmodels is that it gives access to methods that can solve a vast range of statespace models.\n",
    "\n",
    "The model we'll solve is given by\n",
    "$$\n",
    "y_t = \\phi y_{t-1} + \\varepsilon_t + \\theta_1 \\varepsilon_{t-1}, \\qquad \\varepsilon_t \\sim N(0, \\sigma^2)\n",
    "$$\n",
    "with 1 auto-regressive term and 1 moving average term. In statespace form it is written as:\n",
    "$$\n",
    "\\begin{align}\n",
    "y_t & = \\underbrace{\\begin{bmatrix} 1 & \\theta_1 \\end{bmatrix}}_{Z} \\underbrace{\\begin{bmatrix} \\alpha_{1,t} \\\\ \\alpha_{2,t} \\end{bmatrix}}_{\\alpha_t} \\\\\n",
    "    \\begin{bmatrix} \\alpha_{1,t+1} \\\\ \\alpha_{2,t+1} \\end{bmatrix} & = \\underbrace{\\begin{bmatrix}\n",
    "        \\phi & 0 \\\\\n",
    "        1      & 0     \\\\\n",
    "    \\end{bmatrix}}_{T} \\begin{bmatrix} \\alpha_{1,t} \\\\ \\alpha_{2,t} \\end{bmatrix} +\n",
    "    \\underbrace{\\begin{bmatrix} 1 \\\\ 0 \\end{bmatrix}}_{R} \\underbrace{\\varepsilon_{t+1}}_{\\eta_t} \\\\\n",
    "\\end{align}\n",
    "$$\n",
    "\n",
    "The code will follow these steps:\n",
    "1. Import external dependencies\n",
    "2. Download and plot the data on US CPI\n",
    "3. Simple maximum likelihood estimation (MLE) as an example\n",
    "4. Definitions of helper functions to provide tensors to the library doing Bayesian estimation\n",
    "5. Bayesian estimation via NUTS\n",
    "6. Application to US CPI series\n",
    "\n",
    "Finally, Appendix A shows how to re-use the helper functions from step (4) to estimate a different state space model, `UnobservedComponents`, using the same Bayesian methods.\n",
    "\n",
    "#### 1. Import external dependencies"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%matplotlib inline\n",
    "import theano\n",
    "import theano.tensor as tt\n",
    "import pymc3 as pm\n",
    "import numpy as np\n",
    "import matplotlib.pyplot as plt\n",
    "import statsmodels.api as sm\n",
    "import pandas as pd\n",
    "from pandas_datareader.data import DataReader\n",
    "from pandas.plotting import register_matplotlib_converters\n",
    "plt.style.use('seaborn')\n",
    "register_matplotlib_converters()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 2. Download and plot the data on US CPI\n",
    "\n",
    "We'll get the data from FRED:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "cpi = DataReader('CPIAUCNS', 'fred', start='1971-01', end='2018-12')\n",
    "cpi.index = pd.DatetimeIndex(cpi.index, freq='MS')\n",
    "\n",
    "# Define the inflation series that we'll use in analysis\n",
    "inf = np.log(cpi).resample('QS').mean().diff()[1:] * 400\n",
    "print(inf.head())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Plot the series \n",
    "fig, ax = plt.subplots(figsize=(9, 4), dpi=300)\n",
    "ax.plot(inf.index, inf, label=r'$\\Delta \\log CPI$', lw=2)\n",
    "ax.legend(loc='lower left')\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 3. Fit the model with maximum likelihood\n",
    "\n",
    "Statsmodels does all of the hardwork of this for us - creating and fitting the model takes just two lines of code. The model order parameters correspond to auto-regressive, difference, and moving average orders respectively."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Create an SARIMAX model instance - here we use it to estimate\n",
    "# the parameters via MLE using the `fit` method, but we can\n",
    "# also re-use it below for the Bayesian estimation\n",
    "mod = sm.tsa.statespace.SARIMAX(inf, order=(1, 0, 1))\n",
    "\n",
    "res_mle = mod.fit(disp=False)\n",
    "print(res_mle.summary())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "It's a good fit. We can also get the series of one-step ahead predictions and plot it next to the actual data, along with a confidence band.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "predict_mle = res_mle.get_prediction()\n",
    "predict_mle_ci = predict_mle.conf_int()\n",
    "lower = predict_mle_ci['lower CPIAUCNS']\n",
    "upper = predict_mle_ci['upper CPIAUCNS']\n",
    "\n",
    "# Graph\n",
    "fig, ax = plt.subplots(figsize=(9,4), dpi=300)\n",
    "\n",
    "# Plot data points\n",
    "inf.plot(ax=ax, style='-', label='Observed')\n",
    "\n",
    "# Plot predictions\n",
    "predict_mle.predicted_mean.plot(ax=ax, style='r.', label='One-step-ahead forecast')\n",
    "ax.fill_between(predict_mle_ci.index, lower, upper, color='r', alpha=0.1)\n",
    "ax.legend(loc='lower left')\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 4. Helper functions to provide tensors to the library doing Bayesian estimation\n",
    "\n",
    "We're almost on to the magic but there are a few preliminaries. Feel free to skip this section if you're not interested in the technical details.\n",
    "\n",
    "---------\n",
    "\n",
    "##### Technical section\n",
    "\n",
    "PyMC3 is a Bayesian estimation library (\"Probabilistic Programming in Python: Bayesian Modeling and Probabilistic Machine Learning with Theano\") that is a) fast and b) optimised for Bayesian machine learning, for instance [Bayesian neural networks](https://docs.pymc.io/notebooks/bayesian_neural_network_advi.html). To do all of this, it is built on top of a Theano, a library that aims to evaluate tensors very efficiently and provide symbolic differentiation (necessary for any kind of deep learning). It is the symbolic differentiation that means PyMC3 can use NUTS on any problem formulated within PyMC3.\n",
    "\n",
    "We are not formulating a problem directly in PyMC3; we're using statsmodels to specify the statespace model and solve it with the Kalman filter. So we need to put the plumbing of statsmodels and PyMC3 together, which means wrapping the statsmodels SARIMAX model object in a Theano-flavoured wrapper before passing information to PyMC3 for estimation.\n",
    "\n",
    "Because of this, we can't use the Theano auto-differentiation directly. Happily, statsmodels SARIMAX objects have a method to return the Jacobian evaluated at the parameter values. We'll be making use of this to provide gradients so that we can use NUTS."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "###### Defining helper functions to translate models into a PyMC3 friendly form\n",
    "\n",
    "First, we'll create the Theano wrappers. They will be in the form of 'Ops', operation objects, that 'perform' particular tasks. They are initialised with a statsmodels `model` instance.\n",
    "\n",
    "Although this code may look somewhat opaque, it is generic for any state space model in statsmodels."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "class Loglike(tt.Op):\n",
    "\n",
    "    itypes = [tt.dvector] # expects a vector of parameter values when called\n",
    "    otypes = [tt.dscalar] # outputs a single scalar value (the log likelihood)\n",
    "\n",
    "    def __init__(self, model):\n",
    "        self.model = model\n",
    "        self.score = Score(self.model)\n",
    "\n",
    "    def perform(self, node, inputs, outputs):\n",
    "        theta, = inputs  # contains the vector of parameters\n",
    "        llf = self.model.loglike(theta)\n",
    "        outputs[0][0] = np.array(llf) # output the log-likelihood\n",
    "\n",
    "    def grad(self, inputs, g):\n",
    "        # the method that calculates the gradients - it actually returns the\n",
    "        # vector-Jacobian product - g[0] is a vector of parameter values\n",
    "        theta, = inputs  # our parameters\n",
    "        out = [g[0] * self.score(theta)]\n",
    "        return out\n",
    "\n",
    "    \n",
    "class Score(tt.Op):\n",
    "    itypes = [tt.dvector]\n",
    "    otypes = [tt.dvector]\n",
    "\n",
    "    def __init__(self, model):\n",
    "        self.model = model\n",
    "\n",
    "    def perform(self, node, inputs, outputs):\n",
    "        theta, = inputs\n",
    "        outputs[0][0] = self.model.score(theta)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##### End of technical section\n",
    "\n",
    "---------"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 5. Bayesian estimation with NUTS\n",
    "\n",
    "The next step is to set the parameters for the Bayesian estimation, specify our priors, and run it."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Set sampling params\n",
    "ndraws = 3000  # number of draws from the distribution\n",
    "nburn = 600   # number of \"burn-in points\" (which will be discarded)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now for the fun part! There are three parameters to estimate: $\\phi$, $\\theta_1$, and $\\sigma$. We'll use uninformative uniform priors for the first two, and an inverse gamma for the last one. Then we'll run the inference optionally using as many computer cores as I have."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Construct an instance of the Theano wrapper defined above, which\n",
    "# will allow PyMC3 to compute the likelihood and Jacobian in a way\n",
    "# that it can make use of. Here we are using the same model instance\n",
    "# created earlier for MLE analysis (we could also create a new model\n",
    "# instance if we preferred)\n",
    "loglike = Loglike(mod)\n",
    "\n",
    "with pm.Model():\n",
    "    # Priors\n",
    "    arL1 = pm.Uniform('ar.L1', -0.99, 0.99)\n",
    "    maL1 = pm.Uniform('ma.L1', -0.99, 0.99)\n",
    "    sigma2 = pm.InverseGamma('sigma2', 2, 4)\n",
    "\n",
    "    # convert variables to tensor vectors\n",
    "    theta = tt.as_tensor_variable([arL1, maL1, sigma2])\n",
    "\n",
    "    # use a DensityDist (use a lamdba function to \"call\" the Op)\n",
    "    pm.DensityDist('likelihood', lambda v: loglike(v), observed={'v': theta})\n",
    "    \n",
    "    # Draw samples\n",
    "    trace = pm.sample(ndraws, tune=nburn, discard_tuned_samples=True, cores=4)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Note that the NUTS sampler is auto-assigned because we provided gradients. PyMC3 will use Metropolis or Slicing samplers if it doesn't find that gradients are available. There are an impressive number of draws per second for a \"block box\" style computation! However, note that if the model can be represented directly by PyMC3 (like the AR(p) models mentioned above), then computation can be substantially faster.\n",
    "\n",
    "Inference is complete, but are the results any good? There are a number of ways to check. The first is to look at the posterior distributions (with lines showing the MLE values):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "plt.tight_layout()\n",
    "# Note: the syntax here for the lines argument is required for\n",
    "# PyMC3 versions >= 3.7\n",
    "# For version <= 3.6 you can use lines=dict(res_mle.params) instead\n",
    "_ = pm.traceplot(trace,\n",
    "                 lines=[(k, {}, [v]) for k, v in dict(res_mle.params).items()],\n",
    "                 combined=True,\n",
    "                 figsize=(12, 12))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The estimated posteriors clearly peak close to the parameters found by MLE. We can also see a summary of the estimated values:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "pm.summary(trace)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Here Rhat, or $\\hat{R}$, is the Gelman-Rubin statistic. It tests for lack of convergence by comparing the variance between multiple chains to the variance within each chain. If convergence has been achieved, the between-chain and within-chain variances should be identical. If $\\hat{R}<1.2$ for all model parameters, we can have some confidence that convergence has been reached.\n",
    "\n",
    "Additionally, the highest posterior density interval (the gap between the two values of HPD in the table) is small for each of the variables.\n",
    "\n",
    "#### 6. Application of Bayesian estimates of parameters\n",
    "\n",
    "We'll now re-instigate a version of the model but using the parameters from the Bayesian estimation, and again plot the one-step-ahead forecasts."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Retrieve the posterior means\n",
    "params = pm.summary(trace)['mean'].values\n",
    "\n",
    "# Construct results using these posterior means as parameter values\n",
    "res_bayes = mod.smooth(params)\n",
    "\n",
    "predict_bayes = res_bayes.get_prediction()\n",
    "predict_bayes_ci = predict_bayes.conf_int()\n",
    "lower = predict_bayes_ci['lower CPIAUCNS']\n",
    "upper = predict_bayes_ci['upper CPIAUCNS']\n",
    "\n",
    "# Graph\n",
    "fig, ax = plt.subplots(figsize=(9,4), dpi=300)\n",
    "\n",
    "# Plot data points\n",
    "inf.plot(ax=ax, style='-', label='Observed')\n",
    "\n",
    "# Plot predictions\n",
    "predict_bayes.predicted_mean.plot(ax=ax, style='r.', label='One-step-ahead forecast')\n",
    "ax.fill_between(predict_bayes_ci.index, lower, upper, color='r', alpha=0.1)\n",
    "ax.legend(loc='lower left')\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Appendix A. Application to `UnobservedComponents` models"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can reuse the `Loglike` and `Score` wrappers defined above to consider a different state space model. For example, we might want to model inflation as the combination of a random walk trend and autoregressive error term:\n",
    "\n",
    "$$\n",
    "\\begin{aligned}\n",
    "y_t & = \\mu_t + \\varepsilon_t \\\\\n",
    "\\mu_t & = \\mu_{t-1} + \\eta_t \\\\\n",
    "\\varepsilon_t &= \\phi \\varepsilon_t + \\zeta_t\n",
    "\\end{aligned}\n",
    "$$\n",
    "\n",
    "This model can be constructed in Statsmodels with the `UnobservedComponents` class using the `rwalk` and `autoregressive` specifications. As before, we can fit the model using maximum likelihood via the `fit` method."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Construct the model instance\n",
    "mod_uc = sm.tsa.UnobservedComponents(inf, 'rwalk', autoregressive=1)\n",
    "\n",
    "# Fit the model via maximum likelihood\n",
    "res_uc_mle = mod_uc.fit()\n",
    "print(res_uc_mle.summary())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As noted earlier, the Theano wrappers (`Loglike` and `Score`) that we created above are generic, so we can re-use essentially the same code to explore the model with Bayesian methods."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Set sampling params\n",
    "ndraws = 3000  # number of draws from the distribution\n",
    "nburn = 600   # number of \"burn-in points\" (which will be discarded)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Here we follow the same procedure as above, but now we instantiate the\n",
    "# Theano wrapper `Loglike` with the UC model instance instead of the\n",
    "# SARIMAX model instance\n",
    "loglike_uc = Loglike(mod_uc)\n",
    "\n",
    "with pm.Model():\n",
    "    # Priors\n",
    "    sigma2level = pm.InverseGamma('sigma2.level', 1, 1)\n",
    "    sigma2ar = pm.InverseGamma('sigma2.ar', 1, 1)\n",
    "    arL1 = pm.Uniform('ar.L1', -0.99, 0.99)\n",
    "\n",
    "    # convert variables to tensor vectors\n",
    "    theta_uc = tt.as_tensor_variable([sigma2level, sigma2ar, arL1])\n",
    "\n",
    "    # use a DensityDist (use a lamdba function to \"call\" the Op)\n",
    "    pm.DensityDist('likelihood', lambda v: loglike_uc(v), observed={'v': theta_uc})\n",
    "    \n",
    "    # Draw samples\n",
    "    trace_uc = pm.sample(ndraws, tune=nburn, discard_tuned_samples=True, cores=4)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "And as before we can plot the marginal posteriors. In contrast to the SARIMAX example, here the posterior modes are somewhat different from the MLE estimates."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "plt.tight_layout()\n",
    "# Note: the syntax here for the lines argument is required for\n",
    "# PyMC3 versions >= 3.7\n",
    "# For version <= 3.6 you can use lines=dict(res_mle.params) instead\n",
    "_ = pm.traceplot(trace_uc,\n",
    "                 lines=[(k, {}, [v]) for k, v in dict(res_uc_mle.params).items()],\n",
    "                 combined=True,\n",
    "                 figsize=(12, 12))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "pm.summary(trace_uc)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Retrieve the posterior means\n",
    "params = pm.summary(trace_uc)['mean'].values\n",
    "\n",
    "# Construct results using these posterior means as parameter values\n",
    "res_uc_bayes = mod_uc.smooth(params)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "One benefit of this model is that it gives us an estimate of the underling \"level\" of inflation, using the smoothed estimate of $\\mu_t$, which we can access as the \"level\" column in the results objects' `states.smoothed` attribute. In this case, because the Bayesian posterior mean of the level's variance is larger than the MLE estimate, its estimated level is a little more volatile."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Graph\n",
    "fig, ax = plt.subplots(figsize=(9,4), dpi=300)\n",
    "\n",
    "# Plot data points\n",
    "inf['CPIAUCNS'].plot(ax=ax, style='-', label='Observed data')\n",
    "\n",
    "# Plot estimate of the level term\n",
    "res_uc_mle.states.smoothed['level'].plot(ax=ax, label='Smoothed level (MLE)')\n",
    "res_uc_bayes.states.smoothed['level'].plot(ax=ax, label='Smoothed level (Bayesian)')\n",
    "\n",
    "ax.legend(loc='lower left');"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3.7.1 64-bit ('base': conda)",
   "language": "python",
   "name": "python37164bitbaseconda9b5b2a93f2544037b9ba780a6ed404ad"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.1"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
